Each assigned token yields a different synthesis result:


Figure 1. Spectrograms of the synthesis results for 5 tokens, from a model trained on an internal dataset. From top to bottom: tokens 1 to 5.

Token1: 
Token2: 
Token3: 
Token4: 
Token5: 

Each assigned token yields a different speaker voice:

Figure 2. Spectrograms of the synthesis results for 5 tokens, from a model trained on the VCTK dataset. From top to bottom: tokens 1 to 5.

Token1: 
Token2: 
Token3: 
Token4: 
Token5: 

Each assigned token yields a different synthesis result:


Figure 3. Spectrograms of the synthesis results for 5 tokens, from a model trained on the Blizzard2013 dataset. From top to bottom: tokens 1 to 5.

Token1: 
Token2: 
Token3: 
Token4: 
Token5: 

Prosody Transfer

Parallel utterances

The following are three examples of parallel prosody transfer.

In each example, the text to be synthesized is the same as the reference's text. The first utterance in each example is the reference, the second is synthesized with neutral prosody, and the third is the prosody transfer result.

Example 1: reference | neutral prosody | prosody transfer
Example 2: reference | neutral prosody | prosody transfer
Example 3: reference | neutral prosody | prosody transfer

Non-parallel utterances

The following are three examples of non-parallel prosody transfer.

In each example, the text to be synthesized differs from the reference's text. The first utterance in each example is the reference, and the second and third are prosody transfer results for two different texts.

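Example 1: reference | prosody transfer (text 1) | prosody transfer (text 2)
Example 2: reference | prosody transfer (text 1) | prosody transfer (text 2)
Example 3: reference | prosody transfer (text 1) | prosody transfer (text 2)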